Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition
Traditional language models are unable to efficiently model entity names
observed in text. All but the most popular named entities appear infrequently
in text, providing insufficient context. Recent efforts have recognized that
context can be generalized between entity names that share the same type (e.g.,
\emph{person} or \emph{location}) and have equipped language models with access
to an external knowledge base (KB). Our Knowledge-Augmented Language Model
(KALM) continues this line of work by augmenting a traditional model with a KB.
Unlike previous methods, however, we train with an end-to-end predictive
objective optimizing the perplexity of text. We do not require any additional
information such as named entity tags. In addition to improving language
modeling performance, KALM learns to recognize named entities in an entirely
unsupervised way by using entity type information latent in the model. On a
Named Entity Recognition (NER) task, KALM achieves performance comparable with
state-of-the-art supervised models. Our work demonstrates that named entities
(and possibly other types of world knowledge) can be modeled successfully using
predictive learning and training on large corpora of text without any
additional information.
Comment: NAACL 2019; updated to cite Zhou et al. (2018) EMNLP as a piece of related work.
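As a rough illustration of the mechanism the abstract describes (not the authors' released implementation; all module and variable names below are hypothetical), a knowledge-augmented LM head can mix type-specific vocabulary distributions gated by a latent type distribution predicted from context. Training only maximizes next-word likelihood; the latent type posterior is what can later be read off for unsupervised NER. The KB enters solely through per-type vocabulary masks.

```python
import torch
import torch.nn as nn

class KALMStyleHead(nn.Module):
    """Minimal sketch of a knowledge-augmented LM head (hypothetical, not the paper's code).

    From a context vector, predict (a) a distribution over latent entity types
    (type 0 = "general word") and (b) a vocabulary distribution per type, where
    each type is restricted to the names listed for it in the knowledge base.
    """

    def __init__(self, hidden_dim, vocab_size, num_types, kb_masks):
        super().__init__()
        self.type_proj = nn.Linear(hidden_dim, num_types)   # p(type | context)
        self.word_proj = nn.Linear(hidden_dim, vocab_size)  # shared word logits
        # kb_masks: (num_types, vocab_size) 0/1 mask of words allowed per type,
        # built from the KB; type 0 is all ones (general vocabulary).
        self.register_buffer("kb_masks", kb_masks.float())

    def forward(self, context):                              # context: (batch, hidden_dim)
        type_probs = torch.softmax(self.type_proj(context), dim=-1)   # (batch, num_types)
        word_logits = self.word_proj(context)                          # (batch, vocab)
        # One masked softmax per type, then mix:
        # p(w | ctx) = sum_t p(t | ctx) * p(w | t, ctx)
        masked = word_logits.unsqueeze(1) + torch.log(self.kb_masks + 1e-10)
        per_type = torch.softmax(masked, dim=-1)                        # (batch, types, vocab)
        word_probs = (type_probs.unsqueeze(-1) * per_type).sum(dim=1)   # (batch, vocab)
        return word_probs, type_probs  # type_probs serves as the unsupervised NER signal
```

Training minimizes the negative log of `word_probs` at the observed next token (the perplexity objective), with no entity tags involved; at inference, the type posterior for a word in context yields an entity-type prediction.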
Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models
Pre-trained masked language models successfully perform few-shot learning by
formulating downstream tasks as text infilling. However, as a strong
alternative in full-shot settings, discriminative pre-trained models like
ELECTRA do not fit into the paradigm. In this work, we adapt prompt-based
few-shot learning to ELECTRA and show that it outperforms masked language
models in a wide range of tasks. ELECTRA is pre-trained to distinguish whether a
token is generated or original. We naturally extend this objective to prompt-based
few-shot learning by training to score the originality of the target options
without introducing new parameters. Our method can be easily adapted to tasks
involving multi-token predictions without extra computation overhead. Analysis
shows that ELECTRA learns distributions that align better with downstream
tasks.
Comment: Accepted to EMNLP 2022; the code is available at
https://github.com/facebookresearch/ELECTRA-Fewshot-Learnin
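The core scoring step can be sketched with a stock Hugging Face discriminator checkpoint. The template ("It was ___."), the verbalizer words, and the zero-shot usage below are illustrative assumptions, not the paper's exact setup, which additionally trains on the few labelled examples; the discriminator's per-token logits indicate how likely each token is to be a replacement, so low logits mean "original".

```python
import torch
from transformers import ElectraTokenizer, ElectraForPreTraining

tokenizer = ElectraTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")
model.eval()

def originality_score(sentence, option):
    """Fill the prompt with a candidate option and return how 'original'
    (i.e., not replaced) the discriminator thinks the option tokens are."""
    text = f"{sentence} It was {option}."          # hypothetical template
    enc = tokenizer(text, return_tensors="pt")
    option_ids = tokenizer(option, add_special_tokens=False)["input_ids"]
    with torch.no_grad():
        logits = model(**enc).logits[0]            # per-token logits; high = "replaced"
    # Locate the option tokens and average their probability of being original.
    ids = enc["input_ids"][0].tolist()
    for start in range(len(ids) - len(option_ids) + 1):
        if ids[start:start + len(option_ids)] == option_ids:
            span = logits[start:start + len(option_ids)]
            return torch.sigmoid(-span).mean().item()   # P(original) = 1 - P(replaced)
    raise ValueError("option tokens not found in prompt")

review = "The movie was a complete waste of time."
scores = {w: originality_score(review, w) for w in ["great", "terrible"]}
print(max(scores, key=scores.get))  # expected: "terrible"
```

Because the score is read from the existing replaced-token-detection head, no new parameters are introduced, and multi-token options are handled by averaging over the option span.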
Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality
Contrastively trained vision-language models have achieved remarkable
progress in vision and language representation learning, leading to
state-of-the-art models for various downstream multimodal tasks. However,
recent research has highlighted severe limitations of these models in their
ability to perform compositional reasoning over objects, attributes, and
relations. Scene graphs have emerged as an effective way to understand images
compositionally. These are graph-structured semantic representations of images
that contain objects, their attributes, and relations with other objects in a
scene. In this work, we consider the scene graph parsed from text as a proxy
for the image scene graph and propose a graph decomposition and augmentation
framework along with a coarse-to-fine contrastive learning objective between
images and text that aligns sentences of various complexities to the same
image. Along with this, we propose novel negative mining techniques in the
scene graph space for improving attribute binding and relation understanding.
Through extensive experiments, we demonstrate the effectiveness of our approach
that significantly improves attribute binding, relation understanding,
systematic generalization, and productivity on multiple recently proposed
benchmarks (for example, with marked improvements in systematic generalization
and in relation understanding over a strong baseline),
while achieving similar or better performance than CLIP on various general
multimodal tasks.
Comment: 16 pages, 12 figures, 7 tables. Pre-print.
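One way to picture the coarse-to-fine objective is a contrastive loss in which each image is aligned to several captions of increasing complexity (for example, sub-sentences decomposed from the text scene graph), optionally with scene-graph-perturbed captions as hard negatives. The sketch below is a minimal illustration under those assumptions; the encoders, the decomposition, and the negative-mining rules are outside the sketch, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_contrastive_loss(image_emb, text_embs, hard_neg_embs=None, tau=0.07):
    """Minimal sketch (not the paper's implementation).

    image_emb:     (batch, dim) image embeddings
    text_embs:     (batch, levels, dim) captions of increasing complexity per image,
                   e.g. [object phrase, attribute phrase, full sentence]
    hard_neg_embs: optional (batch, levels, dim) scene-graph-perturbed captions
                   (swapped attributes or relations) used as extra negatives
    """
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_embs, dim=-1)
    batch, levels, _ = txt.shape

    loss = 0.0
    for l in range(levels):
        logits = img @ txt[:, l].t() / tau                    # in-batch negatives
        if hard_neg_embs is not None:
            neg = F.normalize(hard_neg_embs[:, l], dim=-1)
            extra = (img * neg).sum(-1, keepdim=True) / tau   # per-image hard negative
            logits = torch.cat([logits, extra], dim=1)
        labels = torch.arange(batch, device=img.device)
        loss = loss + F.cross_entropy(logits, labels)
    return loss / levels
```

Averaging the loss over caption levels is what pulls sentences of different complexities toward the same image embedding; the appended hard-negative column is where attribute-binding and relation errors are penalized.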
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
We present SpeechMatrix, a large-scale multilingual corpus of
speech-to-speech translations mined from real speech of European Parliament
recordings. It contains speech alignments in 136 language pairs with a total of
418 thousand hours of speech. To evaluate the quality of this parallel speech,
we train bilingual speech-to-speech translation models on mined data only and
establish extensive baseline results on EuroParl-ST, VoxPopuli and FLEURS test
sets. Enabled by the multilinguality of SpeechMatrix, we also explore
multilingual speech-to-speech translation, a topic that only a few prior works
have addressed. We also demonstrate that model pre-training and sparse scaling
using Mixture-of-Experts bring large gains to translation performance. The
mined data and models are freely available.
Comment: 18 pages.
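The abstract does not spell out the mining procedure; a common recipe for this kind of parallel-data mining is margin-based similarity scoring over multilingual utterance embeddings, sketched below purely for illustration. Speech encoders and nearest-neighbour retrieval (for example with FAISS) are assumed to be given, and this is not presented as the SpeechMatrix pipeline.

```python
import numpy as np

def margin_score(src_emb, tgt_emb, src_neighbors, tgt_neighbors):
    """Margin-based mining score (ratio margin), sketched for speech embeddings.

    src_emb, tgt_emb:             (dim,) L2-normalized utterance embeddings
    src_neighbors, tgt_neighbors: (k, dim) embeddings of each utterance's k nearest
                                  neighbours in the other language
    """
    cos = float(src_emb @ tgt_emb)
    # Normalize by the average similarity of each side to its own neighbourhood,
    # which down-weights "hub" utterances that are close to everything.
    denom = (np.mean(src_neighbors @ src_emb) + np.mean(tgt_neighbors @ tgt_emb)) / 2.0
    return cos / denom  # pairs above a threshold are kept as aligned speech
```

Thresholding this score trades off the size of the mined corpus against alignment precision, which is what the bilingual baselines trained on mined data only are implicitly evaluating.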